A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem
Kannan, Sampath, Morgenstern, Jamie H., Roth, Aaron, Waggoner, Bo, Wu, Zhiwei Steven
Bandit learning is characterized by the tension between long-term exploration and short-term exploitation. However, as has recently been noted, in settings in which the choices of the learning algorithm correspond to important decisions about individual people (such as criminal recidivism prediction, lending, and sequential drug trials), exploration corresponds to explicitly sacrificing the well-being of one individual for the potential future benefit of others. In such settings, one might like to run a ``greedy'' algorithm, which always makes the optimal decision for the individuals at hand --- but doing this can result in a catastrophic failure to learn. In this paper, we consider the linear contextual bandit problem and revisit the performance of the greedy algorithm. We give a smoothed analysis, showing that even when contexts may be chosen by an adversary, small perturbations of the adversary's choices suffice for the algorithm to achieve ``no regret'', perhaps (depending on the specifics of the setting) with a constant amount of initial training data. This suggests that in slightly perturbed environments, exploration and exploitation need not be in conflict in the linear setting.
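To make the object of study concrete, here is a minimal sketch (our illustration, not the authors' code) of the greedy algorithm in the setting where each arm has its own unknown parameter vector. Names such as greedy_linear_bandit are ours, and the ridge term reg loosely stands in for the constant amount of initial training data mentioned above.

```python
import numpy as np

def greedy_linear_bandit(contexts, thetas, noise_sd=0.1, reg=1.0, seed=0):
    """Greedy policy: each round, pull the arm whose least-squares reward
    estimate on the current context is largest. No exploration bonus."""
    rng = np.random.default_rng(seed)
    T, K, d = contexts.shape                      # rounds, arms, context dim
    A = np.tile(reg * np.eye(d), (K, 1, 1))       # per-arm ridge Gram matrices
    b = np.zeros((K, d))                          # per-arm response vectors
    inst_regret = np.zeros(T)
    for t in range(T):
        X = contexts[t]                           # (K, d): one context per arm
        est = np.array([X[k] @ np.linalg.solve(A[k], b[k]) for k in range(K)])
        k = int(np.argmax(est))                   # exploit-only arm choice
        r = X[k] @ thetas[k] + rng.normal(0.0, noise_sd)
        A[k] += np.outer(X[k], X[k])              # rank-one least-squares update
        b[k] += r * X[k]
        means = np.einsum("kd,kd->k", X, thetas)  # true per-arm mean rewards
        inst_regret[t] = means.max() - means[k]
    return np.cumsum(inst_regret)
```

Solving the regularized normal equations from scratch each round keeps the sketch short; a practical implementation would maintain the inverse incrementally, e.g. via Sherman-Morrison updates.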
Reviews: A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem
This paper is about linear contextual bandits. The authors aim to understand the phenomenon, observed in practice, that exploration seems much less important than the theory usually suggests. As a first step, they analyze the greedy policy, which chooses the arm maximizing the inner product between the least-squares estimate and the context. With no further assumptions this leads to linear regret. The authors therefore make a 'perturbation' assumption under which the contexts are first chosen by an adversary and then perturbed by zero-mean noise. Under carefully chosen assumptions on the noise, they show that the greedy algorithm enjoys O(sqrt(T)) regret. Both standard settings (a single parameter shared across arms, and a separate parameter per arm) are studied.
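The perturbation model the review describes can be simulated directly, reusing greedy_linear_bandit from the sketch above. Here i.i.d. base contexts stand in for the adversary's possibly adaptive choices, and sigma is the smoothing scale; all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
T, K, d, sigma = 5000, 5, 10, 0.1

thetas = rng.normal(size=(K, d))
thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)  # unit-norm arm params

# Base contexts stand in for the adversary's choices; a real adversary could
# pick them adaptively, which plain i.i.d. sampling does not capture.
base = rng.normal(size=(T, K, d))
base /= np.linalg.norm(base, axis=2, keepdims=True)
contexts = base + sigma * rng.normal(size=(T, K, d))     # zero-mean smoothing

cum_regret = greedy_linear_bandit(contexts, thetas)      # sketch above
print(cum_regret[-1] / np.sqrt(T))  # roughly flat in T if regret is O(sqrt(T))
```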
Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits
Zhang, Zihan, Ji, Xiangyang, Zhou, Yuan
We study the optimal batch-regret tradeoff for batch linear contextual bandits. For any batch number $M$, number of actions $K$, time horizon $T$, and dimension $d$, we provide an algorithm and prove its regret guarantee, which, for technical reasons, features a two-phase expression as the time horizon $T$ grows. We also prove a lower bound theorem that, surprisingly, shows the optimality of our two-phase regret upper bound (up to logarithmic factors) in the \emph{full range} of the problem parameters, therefore establishing the exact batch-regret tradeoff. Compared to the recent work \citep{ruan2020linear}, which showed that $M = O(\log \log T)$ batches suffice to achieve the asymptotically minimax-optimal regret without batch constraints, our algorithm is simpler and easier to implement in practice. Furthermore, our algorithm achieves the optimal regret for all $T \geq d$, while \citep{ruan2020linear} requires $T$ to be greater than an unrealistically large polynomial of $d$. Along the way, we also prove a new matrix concentration inequality with dependence on the matrices' dynamic upper bounds, which, to the best of our knowledge, is the first of its kind in the literature and may be of independent interest.
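The batch constraint itself is easy to state in code. The sketch below is not the paper's algorithm; it only illustrates what a budget of $M$ batches means: the policy is frozen within each batch (a greedy policy here, for brevity) and estimates are refreshed only at the batch boundaries, on a uniform grid rather than the carefully designed schedule the paper's guarantees require.

```python
import numpy as np

def batched_greedy(contexts, thetas, M, noise_sd=0.1, reg=1.0, seed=0):
    """Illustration of the batch constraint only (not the paper's algorithm):
    the policy is frozen within each of M batches, and least-squares
    estimates are refreshed only at batch boundaries."""
    rng = np.random.default_rng(seed)
    T, K, d = contexts.shape
    boundaries = np.linspace(0, T, M + 1, dtype=int)   # fixed uniform grid
    A = np.tile(reg * np.eye(d), (K, 1, 1))
    b = np.zeros((K, d))
    regret = 0.0
    for m in range(M):
        # Estimates computed once per batch and frozen until the next boundary.
        theta_hat = np.stack([np.linalg.solve(A[k], b[k]) for k in range(K)])
        for t in range(boundaries[m], boundaries[m + 1]):
            X = contexts[t]
            k = int(np.argmax(np.einsum("kd,kd->k", X, theta_hat)))
            r = X[k] @ thetas[k] + rng.normal(0.0, noise_sd)
            A[k] += np.outer(X[k], X[k])   # feedback collected within the batch
            b[k] += r * X[k]               # ...but only used at the boundary
            means = np.einsum("kd,kd->k", X, thetas)
            regret += means.max() - means[k]
    return regret
```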